Add an edge case test that . matches \\u2029 and \\u2028 #35
@@ -3454,6 +3454,36 @@
"a𐄁b"
]
},
{
"name": "functions, match, dot matcher on \\u2028",
Would it be possible to use \u2028 in the JSON document?
Not sure I understand the question. Do you mean removing the second \? That would make the test name indistinguishable from the other one.
No, I was talking about using \u2028 in the "document" member.
In the document it is defined as \u2028; you can see it in the source file. But it gets replaced with the actual character when the cts.json gets compiled. I'm not sure if it would be possible to keep it as \uXXXX in the compiled cts.json.
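One possible way to keep the escape in a compiled file is to post-process the serializer output. The sketch below assumes the build step is JavaScript/TypeScript based and uses `JSON.stringify`, which writes U+2028 and U+2029 as raw characters. The helper name `stringifyEscapingLineSeparators` and the shape of `testCase` are hypothetical and are not taken from the actual build script.

```typescript
// Hypothetical helper: serialize a value to JSON, writing U+2028 and U+2029
// as \uXXXX escapes instead of raw characters.
function stringifyEscapingLineSeparators(value: unknown, indent: number = 2): string {
  const json: string = JSON.stringify(value, null, indent);
  // In JSON text these two code points can only occur inside string literals,
  // so a global replacement cannot damage the surrounding structure.
  return json.replace(/[\u2028\u2029]/g, (ch: string): string =>
    ch === "\u2028" ? "\\u2028" : "\\u2029"
  );
}

// Illustrative test case; the real CTS entry has more members.
const testCase = {
  name: "functions, match, dot matcher on \\u2028",
  document: ["\u2028"],
};

const compiled: string = stringifyEscapingLineSeparators(testCase);
// Parsing the escaped form gives back the same one-character string.
const roundTripped: string = JSON.parse(compiled).document[0];
```

Both forms decode to the same string, so the choice only affects how the compiled file looks to readers and tools, not what a conforming JSON parser sees.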
I don't think it's terribly important to have the character escaped in the doc. The only problem I can see is that a particularly strict JSON parser might not be able to read it.
My implementation (which uses the .NET regex engine in an "ECMAScript" configuration) is returning the
Probably related to this. I have code that does some translation, but I don't think I did the "little bit of lookahead assertion added to remove
After some googling I figured out how to add the lookahead exclusions, and the tests pass for me now.
Was able to fix this to get things passing in
I-Regexp follows XSD-2, which states that the equivalent character class for . is [^\r\n]. Some programming languages (e.g. JavaScript, Dart) treat . differently; in particular, it won't match the Unicode characters \u2029 and \u2028. This PR introduces a corresponding edge case test.
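The difference described above can be reproduced with a small sketch. It assumes an implementation that maps I-Regexp onto the native JavaScript engine by rewriting each unescaped `.` outside a character class to `[^\n\r]`, following the equivalence stated in the description. The helper name `translateDot` is hypothetical, and the scanner is deliberately minimal rather than a full I-Regexp parser.

```typescript
// Hypothetical helper: rewrite every unescaped "." outside a character class
// to [^\n\r], so that the dot also matches U+2028 and U+2029.
function translateDot(iregexp: string): string {
  let out = "";
  let inClass = false;
  for (let i = 0; i < iregexp.length; i++) {
    const ch: string = iregexp.charAt(i);
    if (ch === "\\" && i + 1 < iregexp.length) {
      // Copy an escape sequence such as \. unchanged.
      out += ch + iregexp.charAt(i + 1);
      i++;
    } else if (ch === "[" && !inClass) {
      inClass = true;
      out += ch;
    } else if (ch === "]" && inClass) {
      inClass = false;
      out += ch;
    } else if (ch === "." && !inClass) {
      out += "[^\\n\\r]";
    } else {
      out += ch;
    }
  }
  return out;
}

// The native dot excludes U+2028; the translated class accepts it.
const nativeDotMatches: boolean = new RegExp("^.$").test("\u2028");
const translatedDotMatches: boolean = new RegExp("^" + translateDot(".") + "$").test("\u2028");
// The translated class still rejects a line feed.
const translatedMatchesNewline: boolean = new RegExp("^" + translateDot(".") + "$").test("\n");
```

Here `nativeDotMatches` is `false` while `translatedDotMatches` is `true`, which is the gap the new edge case tests are meant to expose.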